feat(test-fill,test-benchmark): organize fixtures into fork and gas-limit subdirs#2134
Conversation
635b120 to
070ec15
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #2134 +/- ##
================================================
Coverage 85.85% 85.85%
================================================
Files 599 599
Lines 39428 39428
Branches 3776 3776
================================================
Hits 33851 33851
Misses 4946 4946
Partials 631 631
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
spencer-tb
left a comment
There was a problem hiding this comment.
Another alternative structure would be the following, which has the benefit of keeping the existing structure, the test types staying at the same dir level:
fixtures/
├── blockchain_tests
│ ├── gas_limit_0001M/...
│ └── gas_limit_0002M/...
├── blockchain_tests_engine
│ ├── gas_limit_0001M/...
│ └── gas_limit_0002M/...
└── blockchain_tests_engine_x
├── pre_alloc/
├── gas_limit_0001M/...
└── gas_limit_0002M/...
I don't see any other issues here. The hasher would still work. Consume enginex could need a tweak in the future but not being used atm.
Another point: are those using the benchmarking tests happy with the structure as it might be a breaking change on there end.
|
Thanks for the review @spencer-tb! Yes, I think this is a better structure! |
|
Done! Also flattened one-level in the structure so it's clean. Description updated with verbose before and after trees! |
|
I don't think we should make this change in benchmark release layouts without considering the corresponding change for forks. The linear growth of our new fixtures files with each new fork, is the real issue. Let's clear that up first and then come back to this PR (which could increase in scope?). I made a discussion with poll for that here: |
a5b22e4 to
9a73c4e
Compare
9a73c4e to
8f35355
Compare
|
Updated this PR and description to split fixtures by gas-limit AND fork. |
|
@spencer-tb @marioevz I've updated the PR following our discussion and decision from the STEEL Team meeting (ethsteel/pm#44 (comment)):
Please see the examples with screenshots in the updated PR description to see how the directory structure now looks. I tested the index quickly via |
marioevz
left a comment
There was a problem hiding this comment.
Thanks for implementing this. I think the implementation can be simpler and composable with the suggestion, but let me know what you think.
| gas_benchmark_values = GasBenchmarkValues.from_config( | ||
| request.config | ||
| ) | ||
| is_benchmark_test = any( | ||
| request.node.get_closest_marker(name) | ||
| for name in ("benchmark", "stateful", "repricing") | ||
| ) | ||
| if gas_benchmark_values is not None and is_benchmark_test: | ||
| gas_limit_subdir = format_gas_limit_subdir( | ||
| gas_benchmark_value, gas_benchmark_values.root | ||
| ) | ||
| output_subdir = Path( | ||
| format_fork_subdir(fork.name(), gas_limit_subdir) | ||
| ) | ||
| else: | ||
| output_subdir = Path(format_fork_subdir(fork.name())) |
There was a problem hiding this comment.
Iiuc this does not account for fixed-opcode-count case, i.e. running -m repricing --fixed-opcode-count 1,2 instead of -m benchmark --gas-benchmark-values 1,2.
I think we can improve this approach, in which we try to decode what some other plugin prepared or already decoded, with something more composable:
We could use a new marker, e.g. pytest.mark.fixture_subfolder, that gets added by the plugins in order to signal that the generated fixture for this test should be placed in a specific subfolder.
It could be added in these three places:
And then parsed by this function or the fixture collector.
We could add a level keyword argument to decide the order in which the prefix gets resolved.
I.e. pytest.mark.fixture_subfolder(level=0, prefix=f"for_{fork}") and pytest.mark.fixture_subfolder(level=1, prefix=f"{gas_benchmark_value//1_000_000}M"), or something similar.
There was a problem hiding this comment.
Thanks a lot for catching the --fixed-opcode-count case and the implementation idea; it's very clean.
I added new screenshots for all cases to the description, so you can easily check how it turned out.
Move FixtureOutput from filler/ to shared/ and absorb fixture_naming.py. This prepares for future build-tool plugins that also generate fixtures.
Always include the gas-limit suffix in the fork output subdirectory,
not just for benchmark tests. Non-benchmark (consensus) tests now also
get a subdirectory like `for_prague_at_0120M/` using the environment
default gas limit (respecting `--block-gas-limit`).
The `_at_` separator makes the directory name unambiguous:
for_{fork}_at_{gas}M (e.g. for_prague_at_0120M)
Add a "Fixture Output Directory Structure" section explaining the
for_{fork}_at_{gas}M subdirectory convention, default gas limit, and
how benchmark tests use --gas-benchmark-values.
Consensus fixtures now use `for_{fork}/` (e.g. `for_prague/`) instead of
`for_{fork}_at_{gas}M/`. Benchmark fixtures keep the gas limit suffix.
Replace manual gas-benchmark detection in the filler with a composable pytest.mark.fixture_subfolder(level, prefix) marker: - Add level-0 marker (fork prefix) in fork parametrization. - Add level-1 marker in GasBenchmarkValues and OpcodeCountsConfig. - Resolve output subdirs from markers in filler instead of hard-coding. - Break circular import so collector can share the separator constant. - Register the marker in execute_fill alongside other shared markers.
- Verify --fixed-opcode-count splits fixtures into per-count subdirs. - Mirror the existing gas-benchmark subdir test for opcode counts.
9822571 to
c21bd32
Compare
- Remove unused `should_auto_enable_all_formats` property. - Remove unused `format_gas_limit_subdir()` function. - Remove unreachable `FORK_SUBDIR_PREFIX` check in post-merge verification (format dir is always at `parts[0]`). - Tighten `resolve_fixture_subfolder` parameter from `list` to `list[pytest.Mark]`. - Clarify "benchmark" prefix stripping in collector with comment.
marioevz
left a comment
There was a problem hiding this comment.
LGTM, excellent work :D
Pushed a small commit to fix static tests.
commit a48c892 Author: Kevaundray Wedderburn <kevtheappdev@gmail.com> Date: Fri Feb 27 17:39:02 2026 +0000 introduce BlockDiff commit a4a4b47 Merge: e801c45 5034aa2 Author: Kevaundray Wedderburn <kevtheappdev@gmail.com> Date: Tue Feb 24 23:25:33 2026 +0000 Merge remote-tracking branch 'upstream/forks/amsterdam' into kw/extract-stateless-functionality commit e801c45 Author: Kevaundray Wedderburn <kevtheappdev@gmail.com> Date: Tue Feb 24 23:14:11 2026 +0000 add doc comments for ChainContext fields commit de68f63 Author: Kevaundray Wedderburn <kevtheappdev@gmail.com> Date: Tue Feb 24 22:57:06 2026 +0000 initial commit commit 5034aa2 Author: kevaundray <kevtheappdev@gmail.com> Date: Tue Feb 24 22:56:44 2026 +0000 chore(ci): Skip diffs if not on default branch (ethereum#2304) * skip diffs * Update .github/workflows/gh-pages.yaml Co-authored-by: Sam Wilson <57262657+SamWilsn@users.noreply.github.com> * revert: actionlint --------- Co-authored-by: Sam Wilson <57262657+SamWilsn@users.noreply.github.com> commit 7ca499e Author: danceratopz <danceratopz@gmail.com> Date: Tue Feb 24 20:06:22 2026 +0100 feat(test-fill,test-benchmark): organize fixtures into fork and gas-limit subdirs (ethereum#2134) * feat(test-benchmark): split gas benchmark outputs by subdir * docs(test-benchmark): document gas benchmark output layout * test(test-benchmark): verify gas benchmark subdir layout * feat(test-benchmark): update folder structure according to review * refactor(test-filler): move fixture_output to shared Move FixtureOutput from filler/ to shared/ and absorb fixture_naming.py. This prepares for future build-tool plugins that also generate fixtures. * feat(test-filler): organize fill fixture output by target fork Always include the gas-limit suffix in the fork output subdirectory, not just for benchmark tests. Non-benchmark (consensus) tests now also get a subdirectory like `for_prague_at_0120M/` using the environment default gas limit (respecting `--block-gas-limit`). The `_at_` separator makes the directory name unambiguous: for_{fork}_at_{gas}M (e.g. for_prague_at_0120M) * docs(releases): document fixture output directory layout Add a "Fixture Output Directory Structure" section explaining the for_{fork}_at_{gas}M subdirectory convention, default gas limit, and how benchmark tests use --gas-benchmark-values. * feat(test-filler): drop gas limit from consensus fixture subdirs Consensus fixtures now use `for_{fork}/` (e.g. `for_prague/`) instead of `for_{fork}_at_{gas}M/`. Benchmark fixtures keep the gas limit suffix. * feat(test-filler): use fixture_subfolder marker for output subdirs Replace manual gas-benchmark detection in the filler with a composable pytest.mark.fixture_subfolder(level, prefix) marker: - Add level-0 marker (fork prefix) in fork parametrization. - Add level-1 marker in GasBenchmarkValues and OpcodeCountsConfig. - Resolve output subdirs from markers in filler instead of hard-coding. - Break circular import so collector can share the separator constant. - Register the marker in execute_fill alongside other shared markers. * feat(test-tests): add fixed-opcode-count subdirectory output test - Verify --fixed-opcode-count splits fixtures into per-count subdirs. - Mirror the existing gas-benchmark subdir test for opcode counts. * fix(test-fill): fix-up typehints for mypy * fix(test-fill): remove dead code and tighten types - Remove unused `should_auto_enable_all_formats` property. - Remove unused `format_gas_limit_subdir()` function. - Remove unreachable `FORK_SUBDIR_PREFIX` check in post-merge verification (format dir is always at `parts[0]`). - Tighten `resolve_fixture_subfolder` parameter from `list` to `list[pytest.Mark]`. - Clarify "benchmark" prefix stripping in collector with comment. * docs(test-fill): update docs for fixed-opcode-count fixture dir layout * fix: static tests --------- Co-authored-by: Mario Vega <marioevz@gmail.com> commit 84025b7 Author: Sam Wilson <57262657+SamWilsn@users.noreply.github.com> Date: Tue Feb 24 14:05:19 2026 -0500 chore: bump docc version (ethereum#2303) commit 4fff5eb Author: Mario Vega <marioevz@gmail.com> Date: Tue Feb 24 19:30:35 2026 +0100 feat(test-execute): Support `testing_buildBlockV1` (ethereum#2295) * feat(test-rpc): Add `Testing` namespace * nit: transactions parameter * feat(test-execute): Support `testing_buildBlockV1` * docs * refactor(execute): Testing endpoint non-auth * review comments Co-authored-by: fselmo <fselmo2@gmail.com> --------- Co-authored-by: fselmo <fselmo2@gmail.com> commit 7329095 Author: Carson <carson@ethereum.org> Date: Tue Feb 24 12:29:35 2026 -0500 refactor(test): rename/align gas cost constants (ethereum#2094) - rename base gas cost - rename very low gas cost - rename memory gas cost - rename copy gas cost - rename low gas cost - rename medium gas cost - rename high gas cost - rename warm access gas cost - rename cold access gas cost - rename storage reset gas cost - rename storage clear gas cost - rename storage set gas cost - rename warm account gas cost - rename cold account gas cost - rename initcode word gas cost - rename code deposit gas cost - rename create gas cost - rename call stipen gas cost - rename selfdestruct gas cost - rename call value gas cost - rename new account gas cost - rename exp gas cost - rename exp per byte gas cost - rename keccak256 gas cost - rename keccak256 word gas cost - rename blockhash gas cost - rename log data gas cost - rename log gas cost - rename log topic gas cost - rename tx based gas cost - rename contract creation gas cost - rename access list addr gas cost - rename access list storage gas cost - rename floor calldata gas cost - rename token standard gas cost - rename authorize gas cost - rename authorize base cost - rename data zero gas cost - rename non-zero data gas cost - rename blob gas per block gas cost - rename blob min price gas cost - rename all precompile gas costs - (refactor) TX_ACCESS_LIST_ADDRESS_COST => GAS_TX_ACCESS_LIST_ADDRESS - (refactor) GAS_CODE_DEPOSIT => GAS_CODE_DEPOSIT_PER_BYTE - (refactor) remove unnecessary Uint cast for REFUND_STORAGE_CLEAR - (refactor) TX_ACCESS_LIST_STORAGE_KEY_COST => GAS_TX_ACCESS_LIST_STORAGE_KEY - (refactor) GAS_KECCAK256_WORD => GAS_KECCAK256_PER_WORD - (refactor) TX_BASE_COST => GAS_TX_BASE - (refactor) TX_CREATE_COST => GAS_TX_CREATE - (refactor) TX_DATA_STANDARD_TOKEN_COST => GAS_TX_DATA_TOKEN_STANDARD - (refactor) => TX_DATA_FLOOR_TOKEN_COST => GAS_TX_DATA_TOKEN_FLOOR - (refactor) GAS_INIT_CODE_WORD_COST => GAS_CODE_INIT_PER_WORD - (refactor) TX_DATA_COST_PER_ZERO => GAS_TX_DATA_PER_ZERO - (refactor) TX_DATA_COST_PER_NON_ZERO => GAS_TX_DATA_PER_NON_ZERO - (refactor) GAS_AUTH_PER_EMPTY_ACCOUNT_COST => GAS_AUTH_PER_EMPTY_ACCOUNT - (refactor) GAS_LOG_DATA => GAS_LOG_DATA_PER_BYTE - (refactor) GAS_PRECOMPILE_SHA256_WORD => GAS_PRECOMPILE_SHA256_PER_WORD - (refactor) GAS_PRECOMPILE_RIPEMD160_WORD => GAS_PRECOMPILE_RIPEMD160_PER_WORD - (refactor) GAS_PRECOMPILE_IDENTITY_WORD => GAS_PRECOMPILE_IDENTITY_PER_WORD - (refactor) REFUND_PER_AUTH_BASE_COST => REFUND_AUTH_PER_EXISTING_ACCOUNT --------- Co-authored-by: LouisTsai <q1030176@gmail.com> commit cdf9add Author: Paweł Bylica <pawel@hepcolgum.band> Date: Tue Feb 24 16:53:55 2026 +0100 feat(tests): port EXTCODEHASH dynamicAccountOverwriteEmpty (ethereum#2032) Port the legacy static test: stExtCodeHash/dynamicAccountOverwriteEmpty_Paris for EIP-1052 EXTCODEHASH testing. The test verifies EXTCODEHASH correctly updates when an empty account is overwritten by CREATE2. Modified from original: target account has no storage to avoid EIP-7610 collision behavior. commit 9bd111a Author: felix <felix314159@users.noreply.github.com> Date: Tue Feb 24 14:20:18 2026 +0100 fix: failing test in ci for 7981 (ethereum#2297) commit 6f80f72 Author: Jochem Brouwer <jochembrouwer96@gmail.com> Date: Tue Feb 24 10:25:24 2026 +0100 refactor(test-benchmarks): ensure `SSTORE` N->N writes nonzero key/value (ethereum#2255) commit 04fb242 Author: Paweł Bylica <pawel@hepcolgum.band> Date: Mon Feb 23 23:24:31 2026 +0100 feat(tests): port EXTCODEHASH codeless account with storage test (ethereum#2291) Port sub-case from dynamicAccountOverwriteEmpty_ParisFiller.yml: a codeless account with storage returns keccak256("") for EXTCODEHASH and 0 for EXTCODESIZE. Parametrized over three non-empty variants (balance only, nonce only, both). commit 7add493 Author: Sam Wilson <57262657+SamWilsn@users.noreply.github.com> Date: Mon Feb 23 15:45:25 2026 -0500 chore: expand docstring guide (ethereum#2195) Co-authored-by: danceratopz <danceratopz@gmail.com>
🗒️ Description
This PR implements a solution to avoid the indefinite linear growth of JSON fixture files with new forks by splitting the test cases (fixtures) for a specific fork into different subdirectories. For benchmark releases, they are additionally split by gas-limit, for easier consumption.
fillnow organizes:for_{fork}).for_{fork}_at_{gas}M/).for_{fork}_at_opcount_{count}k/).FixtureOutputand new output-naming helpers (FORK_SUBDIR_PREFIX,format_fork_subdir(),format_gas_limit_subdir()) intoshared/fixture_output.py(shared in preparation for the newbuildcli that usesdebug_BuildBlockV1).blockchain_tests_engine_x/pre_allocshared at the output root and reject--output=stdoutfor gas benchmark runs.releases.mdandbenchmarks.md.Consensus fixtures
Fixtures are now grouped by the fork under test. Below the fork directory, the layout mirrors
./tests/(organized by the fork where functionality was introduced).Before - all forks mixed in a flat structure:
After - separated by target fork and gas limit:
Example Consensus Tests (regular fill)
Command:
Screenshot (note some warm coinbase tests fill for forks >= Berlin):

Benchmark fixtures
Before - all tests parametrized by gas values or opcode counts placed in a single fixture file, no per-value split:
-m benchmark --gas-benchmark-values=x,y,zBenchmark fixtures with
--gas-benchmark-valuesuse the same scheme, with one subdirectory per gas value.After - split by fork and gas value:
Example Gas Benchmark Values
Command:
Screenshot:

-m repricing --fixed-opcode-count=x,y,zAfter - split by fork and opcode count value:
Example Opcode Count
Command:
Screenshot:

🔗 Related Issues or PRs
N/A.
✅ Checklist
toxchecks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx tox -e statictype(scope):.mkdocs servelocally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.Cute Animal Picture